Data Challenge¶
The Idea¶
One of my hobbies is baking. I love to bake cakes, cookies, tarts, and all those delicious treats. This hobby sparked my idea for this challenge. We all know about the apple, butter, carrot, or milk in the back of the fridge that is about to spoil. You can still eat it, but it doesn't look appetizing, or it's too much to use up in a short amount of time. So I thought it would be a good challenge to create an app that identifies the ingredients in a picture you upload and uses that list of ingredients to find a recipe, so you don't have to throw those ingredients away.
Approach¶
My approach to this challenge is to start with a CNN model that can identify the ingredients in a picture that you upload. From this model, an ingredient list will be made. This list will then be used in an NLP model to find a recipe you could make to use those ingredients. For this, I spent some time finding images I can use to create this CNN model, and I found a dataset on Roboflow that seems good enough for now.
After finding the data, I started by loading the images and researching how best to train a model to detect the ingredients in the dataset.
Loading the packages¶
Starting off by loading the packages that I am going to use in this notebook.
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
import pandas as pd
import tensorflow as tf
from collections import Counter
from tensorflow.keras import models, layers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.vgg16 import preprocess_input
Loading the data¶
Here the data will be loaded. This data is already labeled using a bounding box method.
dataset_root = "FridgeDetection_data"
splits = ["train", "test", "valid"]
def load_data(split):
    image_dir = os.path.join(dataset_root, split, "images")
    label_dir = os.path.join(dataset_root, split, "labelTxt")
    images, labels = [], []
    for file in os.listdir(image_dir):
        if file.lower().endswith((".jpg", ".png")):
            img_path = os.path.join(image_dir, file)
            label_path = os.path.join(label_dir, os.path.splitext(file)[0] + ".txt")
            images.append(cv2.imread(img_path))
            labels.append(open(label_path).read() if os.path.exists(label_path) else None)
    return images, labels
data = {split: dict(zip(["images", "labels"], load_data(split))) for split in splits}
# Print summary
for split in splits:
    print(f"{split.capitalize()}: Loaded {len(data[split]['images'])} images and {len(data[split]['labels'])} labels.")
Train: Loaded 1521 images and 1521 labels.
Test: Loaded 73 images and 73 labels.
Valid: Loaded 145 images and 145 labels.
The dataset consists of 1521 images for training, 145 images for validation, and 73 images for testing.
Data Understanding¶
The first thing to look at is how the images look when loaded with the labels.
def plot_images_with_labels(images, labels, title, num_images=10, max_columns=2):
    combined = list(zip(images, labels))
    random.shuffle(combined)
    images, labels = zip(*combined[:num_images])
    num_rows = (len(images) + max_columns - 1) // max_columns
    fig, axes = plt.subplots(num_rows, max_columns, figsize=(10, 5 * num_rows))
    axes = np.atleast_1d(axes).flatten()
    for ax, img, label_text in zip(axes, images, labels):
        img = img.copy()
        if label_text:
            for line in label_text.strip().splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    coords = list(map(int, parts[:8]))
                    label = parts[8]
                    # int32 is the coordinate type cv2.polylines expects
                    pts = np.array(coords, dtype=np.int32).reshape((4, 2))
                    cv2.polylines(img, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
                    cv2.putText(img, label, tuple(pts[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        ax.axis("off")
    # Hide unused subplots
    for ax in axes[len(images):]:
        ax.axis("off")
    fig.suptitle(title, fontsize=16)
    plt.subplots_adjust(top=0.92, hspace=0.3)
    plt.show()
# Example usage
plot_images_with_labels(data['train']['images'], data['train']['labels'], "Train Set with Labels")
plot_images_with_labels(data['valid']['images'], data['valid']['labels'], "Valid Set with Labels")
plot_images_with_labels(data['test']['images'], data['test']['labels'], "Test Set with Labels")
As you can see above, the images have one or more labels that are annotated using bounding boxes. Additionally, there are labels that are not useful for the tool I aim to create. These include labels like bags, baskets, trays, and so on. In the next step, I will analyze the data to identify more of these types of labels.
# Function to count labels
def count_labels(data):
    label_counts = Counter()
    for split in data:
        for label_data in data[split]['labels']:
            if label_data:
                for line in label_data.splitlines():
                    parts = line.split()
                    if len(parts) >= 9:
                        label = parts[8]
                        label_counts[label] += 1
    return label_counts
# Count labels across all splits
label_counts = count_labels(data)
# Print the counts
for label, count in label_counts.items():
    print(f"{label}: {count}")
crackers: 189
sausages: 359
sprite: 186
chocolate_drink: 363
coke: 218
orange: 222
apple: 169
paprika: 238
noodles: 191
cereal: 231
grape_juice: 254
basket: 92
orange_juice: 210
scrubby: 238
sponge_opl: 200
cloth_opl: 92
potato_chips: 191
pringles: 213
potato: 251
onion: 344
garlic: 224
butter: 152
eggs: 347
tomato: 199
lemon: 206
tray: 98
help_me_carry_opl: 37
So the data does not have as much variety in ingredients as I had hoped, but for at least a first prototype I think it will have to do. Besides that, there are labels like:
- basket
- scrubby
- sponge_opl
- cloth_opl
- tray
- help_me_carry_opl
These labels do not contribute to the goal of this tool. So I will be looking into removing them from the dataset.
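As a first step toward that cleanup, here is a minimal sketch of how those annotations could be dropped. The helper name and the excluded set are my own, based on the counts printed above:

```python
# Labels in the dataset that are not ingredients (assumption based on the counts above)
NON_INGREDIENT_LABELS = {"basket", "scrubby", "sponge_opl", "cloth_opl", "tray", "help_me_carry_opl"}

def filter_labels(label_data, excluded=NON_INGREDIENT_LABELS):
    """Drop annotation lines whose class name (9th field) is in `excluded`.

    Each line holds 8 polygon coordinates followed by the label.
    Returns None if no ingredient annotations remain.
    """
    if not label_data:
        return None
    kept = [
        line for line in label_data.splitlines()
        if len(line.split()) >= 9 and line.split()[8] not in excluded
    ]
    return "\n".join(kept) if kept else None
```

Applied per split, e.g. `[filter_labels(lbl) for lbl in data['train']['labels']]`, this would remove the unwanted boxes before encoding.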
Next, we will be looking into the shape of the images.
# Check the size of the first image in the training set
image_shape = data['train']['images'][0].shape
print(f"Image size: {image_shape}")
Image size: (640, 640, 3)
As you can see, the shape of the images is (640, 640, 3). When making the first model, I ran into the issue that the images were too large for the resources of my computer. Therefore, I will resize them to (224, 224, 3) in the data preparation step.
Preprocessing¶
I started off by making my job a bit easier and naming the different sets as X_train, y_train, and so on.
X_train = data['train']['images']
X_val = data['valid']['images']
y_train = data['train']['labels']
y_val = data['valid']['labels']
X_test = data['test']['images']
y_test = data['test']['labels']
Then I went on to resize the images so that my computer could actually train a model on them.
def resize(img):
    # Convert to RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Normalize pixel values to [0, 1]
    img = img.astype("float32") / 255.0
    # Clip values to ensure they are within the valid range [0, 1] for floats
    # (note: the cubic interpolation below can still push values slightly outside this range)
    img = np.clip(img, 0.0, 1.0)
    # Resize image
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    return img
# Apply preprocessing to all sets
X_train = np.array([resize(img) for img in X_train])
X_val = np.array([resize(img) for img in X_val])
X_test = np.array([resize(img) for img in X_test])
image_shape = X_train[0].shape
print(f"Image size: {image_shape}")
Image size: (224, 224, 3)
Now that the images are resized, I also have to rescale the label coordinates so that the bounding boxes still line up with the resized images.
def resize_labels(labels, original_size, new_size):
    resized_labels = []
    scale_x = new_size[0] / original_size[0]
    scale_y = new_size[1] / original_size[1]
    for label_data in labels:
        if label_data:
            resized_label_data = []
            for line in label_data.splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    x1, y1, x2, y2, x3, y3, x4, y4, label = parts[:9]
                    # Scale x coordinates (even indices) and y coordinates (odd indices)
                    coords = [
                        int(int(p) * (scale_x if i % 2 == 0 else scale_y))
                        for i, p in enumerate([x1, y1, x2, y2, x3, y3, x4, y4])
                    ]
                    resized_label_data.append(" ".join(map(str, coords)) + f" {label}")
            resized_labels.append("\n".join(resized_label_data))
        else:
            resized_labels.append(None)
    return resized_labels

# Example usage
original_size = (640, 640) # Assuming original images are 640x640
new_size = (224, 224) # Resized images are 224x224
# Resize labels for train, validation, and test sets
resized_train_labels = resize_labels(y_train, original_size, new_size)
resized_val_labels = resize_labels(y_val, original_size, new_size)
resized_test_labels = resize_labels(y_test, original_size, new_size)
# Print a few examples to verify
print("Original label:", y_train[0])
print("Resized label:", resized_train_labels[0])
Original label: 121 107 522 107 522 418 121 418 crackers 0
Resized label: 42 37 182 37 182 146 42 146 crackers
Now that I have resized the labels, plotting the images with their labels again will show whether the boxes still line up correctly.
def plot_images_with_labels(images, labels, title, num_images=10, max_columns=2):
    # Shuffle images and labels together
    combined = list(zip(images, labels))
    random.shuffle(combined)
    images, labels = zip(*combined)
    num_rows = (min(num_images, len(images)) + max_columns - 1) // max_columns
    plt.figure(figsize=(10, 5 * num_rows))
    for i in range(min(num_images, len(images))):
        img = images[i].copy()  # Make a copy to avoid modifying the original image
        if labels[i]:
            for line in labels[i].splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    # Extract the coordinates and label
                    x1, y1, x2, y2, x3, y3, x4, y4, label = parts[:9]
                    # Convert the coordinates to integers
                    x1, y1, x2, y2, x3, y3, x4, y4 = map(int, [x1, y1, x2, y2, x3, y3, x4, y4])
                    # Draw the label text
                    cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
                    # Draw the bounding box polygon
                    cv2.polylines(img, [np.array([[x1, y1], [x2, y2], [x3, y3], [x4, y4]], np.int32)],
                                  isClosed=True, color=(0, 0, 255), thickness=2)
        # Plot the image
        plt.subplot(num_rows, max_columns, i + 1)
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # Convert from BGR to RGB for displaying
        plt.axis("off")
    # Set title and show the plot
    plt.suptitle(title)
    plt.show()
# Plot 10 random images from the train set with labels
plot_images_with_labels(X_train, resized_train_labels, "Train Set with Labels")
# Plot 10 random images from the validation set with labels
plot_images_with_labels(X_val, resized_val_labels, "Valid Set with Labels")
y_train = resized_train_labels
y_val = resized_val_labels
y_test = resized_test_labels
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.119539365..255.0].
(The same warning repeats for each plotted train and validation image, with minima between roughly -0.12 and 0.0 and maxima of 255.0.)
The labels and images seem to be resized correctly. However, I have a feeling that the colors might have been slightly altered during the preprocessing step. I have, however, decided to leave the non-ingredient labels and this color problem for later so I can test a few initial CNN models.
Next, I will be encoding the labels into a multi-hot format for modeling.
# Preprocess labels into multi-hot encoded format
def encode_labels(labels, label_counts):
    label_to_index = {label: idx for idx, label in enumerate(label_counts.keys())}
    num_classes = len(label_counts)
    encoded_labels = []
    for label_data in labels:
        multi_hot = np.zeros(num_classes, dtype=np.float32)
        if label_data:
            for line in label_data.splitlines():
                parts = line.split()
                if len(parts) >= 9:
                    label = parts[8]
                    if label in label_to_index:
                        multi_hot[label_to_index[label]] = 1.0
        encoded_labels.append(multi_hot)
    return np.array(encoded_labels)
# Apply preprocessing to train, validation, and test labels
y_train_encoded = encode_labels(y_train, label_counts)
y_val_encoded = encode_labels(y_val, label_counts)
y_test_encoded = encode_labels(y_test, label_counts)
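As a quick sanity check on this encoding, a multi-hot vector can be decoded back into label names. The toy label order below is my own stand-in for `label_counts.keys()`:

```python
import numpy as np

# Toy label order standing in for label_counts.keys(); the real mapping
# is built the same way inside encode_labels.
label_order = ["apple", "onion", "tomato"]

def decode_labels(multi_hot, label_order):
    """Turn a multi-hot vector back into the list of labels that are present."""
    return [label for label, v in zip(label_order, multi_hot) if v == 1.0]

vec = np.array([1.0, 0.0, 1.0], dtype=np.float32)
print(decode_labels(vec, label_order))  # ['apple', 'tomato']
```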
Moving on to the modeling section.
Modeling¶
I started with creating my own model just to see what would happen. I expected no decent results since classifying multiple categories in one image would require a more complex approach than what I implemented here.
# Define the number of classes based on the label_counts
num_classes = len(label_counts)
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='sigmoid')  # sigmoid for multi-label
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Fit the model with preprocessed labels
history = model.fit(X_train, y_train_encoded, epochs=50, batch_size=32, validation_data=(X_val, y_val_encoded))
Epoch 1/50 48/48 [==============================] - 3s 55ms/step - loss: 0.3493 - accuracy: 0.0769 - val_loss: 0.3150 - val_accuracy: 0.0621 Epoch 2/50 48/48 [==============================] - 2s 51ms/step - loss: 0.2937 - accuracy: 0.1256 - val_loss: 0.2687 - val_accuracy: 0.1172 Epoch 3/50 48/48 [==============================] - 2s 51ms/step - loss: 0.2474 - accuracy: 0.1887 - val_loss: 0.2364 - val_accuracy: 0.1379 Epoch 4/50 48/48 [==============================] - 2s 52ms/step - loss: 0.1880 - accuracy: 0.3110 - val_loss: 0.2495 - val_accuracy: 0.2207 Epoch 5/50 48/48 [==============================] - 2s 51ms/step - loss: 0.1177 - accuracy: 0.4918 - val_loss: 0.3202 - val_accuracy: 0.2000 Epoch 6/50 48/48 [==============================] - 2s 52ms/step - loss: 0.0627 - accuracy: 0.6003 - val_loss: 0.3445 - val_accuracy: 0.2069 Epoch 7/50 48/48 [==============================] - 2s 52ms/step - loss: 0.0252 - accuracy: 0.6647 - val_loss: 0.4055 - val_accuracy: 0.2207 Epoch 8/50 48/48 [==============================] - 2s 52ms/step - loss: 0.0115 - accuracy: 0.6890 - val_loss: 0.4542 - val_accuracy: 0.2138 Epoch 9/50 48/48 [==============================] - 3s 53ms/step - loss: 0.0072 - accuracy: 0.6811 - val_loss: 0.5365 - val_accuracy: 0.2276 Epoch 10/50 48/48 [==============================] - 3s 52ms/step - loss: 0.0045 - accuracy: 0.6963 - val_loss: 0.5354 - val_accuracy: 0.1724 Epoch 11/50 48/48 [==============================] - 3s 52ms/step - loss: 0.0019 - accuracy: 0.6923 - val_loss: 0.5917 - val_accuracy: 0.2069 Epoch 12/50 48/48 [==============================] - 3s 53ms/step - loss: 0.0015 - accuracy: 0.6903 - val_loss: 0.6072 - val_accuracy: 0.1793 Epoch 13/50 48/48 [==============================] - 3s 53ms/step - loss: 4.3843e-04 - accuracy: 0.6936 - val_loss: 0.6543 - val_accuracy: 0.2138 Epoch 14/50 48/48 [==============================] - 3s 53ms/step - loss: 1.8005e-04 - accuracy: 0.6943 - val_loss: 0.6805 - val_accuracy: 0.2138 Epoch 15/50 
48/48 [==============================] - 3s 53ms/step - loss: 9.8531e-05 - accuracy: 0.7068 - val_loss: 0.7015 - val_accuracy: 0.2138 Epoch 16/50 48/48 [==============================] - 3s 53ms/step - loss: 7.1389e-05 - accuracy: 0.6949 - val_loss: 0.7203 - val_accuracy: 0.2138 Epoch 17/50 48/48 [==============================] - 3s 53ms/step - loss: 5.8388e-05 - accuracy: 0.6930 - val_loss: 0.7323 - val_accuracy: 0.2138 Epoch 18/50 48/48 [==============================] - 3s 54ms/step - loss: 4.9544e-05 - accuracy: 0.6936 - val_loss: 0.7434 - val_accuracy: 0.2138 Epoch 19/50 48/48 [==============================] - 3s 53ms/step - loss: 4.2591e-05 - accuracy: 0.6910 - val_loss: 0.7545 - val_accuracy: 0.2276 Epoch 20/50 48/48 [==============================] - 3s 53ms/step - loss: 3.7319e-05 - accuracy: 0.6917 - val_loss: 0.7642 - val_accuracy: 0.2207 Epoch 21/50 48/48 [==============================] - 3s 54ms/step - loss: 3.3009e-05 - accuracy: 0.6910 - val_loss: 0.7744 - val_accuracy: 0.2276 Epoch 22/50 48/48 [==============================] - 3s 54ms/step - loss: 2.9339e-05 - accuracy: 0.6897 - val_loss: 0.7839 - val_accuracy: 0.2276 Epoch 23/50 48/48 [==============================] - 3s 54ms/step - loss: 2.6343e-05 - accuracy: 0.6884 - val_loss: 0.7902 - val_accuracy: 0.2345 Epoch 24/50 48/48 [==============================] - 3s 56ms/step - loss: 2.3820e-05 - accuracy: 0.6910 - val_loss: 0.7978 - val_accuracy: 0.2345 Epoch 25/50 48/48 [==============================] - 3s 55ms/step - loss: 2.1637e-05 - accuracy: 0.6890 - val_loss: 0.8065 - val_accuracy: 0.2345 Epoch 26/50 48/48 [==============================] - 3s 56ms/step - loss: 1.9710e-05 - accuracy: 0.6897 - val_loss: 0.8115 - val_accuracy: 0.2345 Epoch 27/50 48/48 [==============================] - 3s 55ms/step - loss: 1.8074e-05 - accuracy: 0.6897 - val_loss: 0.8186 - val_accuracy: 0.2345 Epoch 28/50 48/48 [==============================] - 3s 55ms/step - loss: 1.6605e-05 - accuracy: 0.6903 - 
val_loss: 0.8257 - val_accuracy: 0.2345 Epoch 29/50 48/48 [==============================] - 3s 55ms/step - loss: 1.5301e-05 - accuracy: 0.6910 - val_loss: 0.8314 - val_accuracy: 0.2345 Epoch 30/50 48/48 [==============================] - 3s 55ms/step - loss: 1.4154e-05 - accuracy: 0.6890 - val_loss: 0.8369 - val_accuracy: 0.2345 Epoch 31/50 48/48 [==============================] - 3s 54ms/step - loss: 1.3139e-05 - accuracy: 0.6903 - val_loss: 0.8430 - val_accuracy: 0.2345 Epoch 32/50 48/48 [==============================] - 3s 55ms/step - loss: 1.2203e-05 - accuracy: 0.6903 - val_loss: 0.8486 - val_accuracy: 0.2276 Epoch 33/50 48/48 [==============================] - 3s 55ms/step - loss: 1.1368e-05 - accuracy: 0.6917 - val_loss: 0.8531 - val_accuracy: 0.2345 Epoch 34/50 48/48 [==============================] - 3s 55ms/step - loss: 1.0593e-05 - accuracy: 0.6897 - val_loss: 0.8582 - val_accuracy: 0.2414 Epoch 35/50 48/48 [==============================] - 3s 54ms/step - loss: 9.9063e-06 - accuracy: 0.6903 - val_loss: 0.8630 - val_accuracy: 0.2276 Epoch 36/50 48/48 [==============================] - 3s 54ms/step - loss: 9.2752e-06 - accuracy: 0.6917 - val_loss: 0.8680 - val_accuracy: 0.2345 Epoch 37/50 48/48 [==============================] - 3s 54ms/step - loss: 8.7000e-06 - accuracy: 0.6917 - val_loss: 0.8732 - val_accuracy: 0.2276 Epoch 38/50 48/48 [==============================] - 3s 54ms/step - loss: 8.1690e-06 - accuracy: 0.6936 - val_loss: 0.8769 - val_accuracy: 0.2276 Epoch 39/50 48/48 [==============================] - 3s 54ms/step - loss: 7.6836e-06 - accuracy: 0.6936 - val_loss: 0.8823 - val_accuracy: 0.2276 Epoch 40/50 48/48 [==============================] - 3s 55ms/step - loss: 7.2298e-06 - accuracy: 0.6930 - val_loss: 0.8866 - val_accuracy: 0.2276 Epoch 41/50 48/48 [==============================] - 3s 54ms/step - loss: 6.8198e-06 - accuracy: 0.6923 - val_loss: 0.8908 - val_accuracy: 0.2345 Epoch 42/50 48/48 [==============================] - 3s 
55ms/step - loss: 6.4368e-06 - accuracy: 0.6936 - val_loss: 0.8944 - val_accuracy: 0.2345 Epoch 43/50 48/48 [==============================] - 3s 55ms/step - loss: 6.0766e-06 - accuracy: 0.6930 - val_loss: 0.8990 - val_accuracy: 0.2345 Epoch 44/50 48/48 [==============================] - 3s 55ms/step - loss: 5.7440e-06 - accuracy: 0.6923 - val_loss: 0.9035 - val_accuracy: 0.2345 Epoch 45/50 48/48 [==============================] - 3s 54ms/step - loss: 5.4327e-06 - accuracy: 0.6930 - val_loss: 0.9066 - val_accuracy: 0.2345 Epoch 46/50 48/48 [==============================] - 3s 55ms/step - loss: 5.1440e-06 - accuracy: 0.6943 - val_loss: 0.9112 - val_accuracy: 0.2345 Epoch 47/50 48/48 [==============================] - 3s 55ms/step - loss: 4.8778e-06 - accuracy: 0.6930 - val_loss: 0.9162 - val_accuracy: 0.2345 Epoch 48/50 48/48 [==============================] - 3s 55ms/step - loss: 4.6291e-06 - accuracy: 0.6923 - val_loss: 0.9189 - val_accuracy: 0.2345 Epoch 49/50 48/48 [==============================] - 3s 55ms/step - loss: 4.3967e-06 - accuracy: 0.6936 - val_loss: 0.9226 - val_accuracy: 0.2345 Epoch 50/50 48/48 [==============================] - 3s 55ms/step - loss: 4.1790e-06 - accuracy: 0.6936 - val_loss: 0.9272 - val_accuracy: 0.2345
# Plot accuracy and loss graphs
def plot_training_history(history):
    # Extract accuracy and loss values
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)
    # Plot accuracy
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, 'b', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, 'b', label='Training Loss')
    plt.plot(epochs, val_loss, 'r', label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.tight_layout()
    plt.show()
# Call the function with the history object
plot_training_history(history)
The results were as expected: not good. The training loss collapses to nearly zero while the validation loss keeps climbing, a clear sign of overfitting, and the validation accuracy stays very low.
Transfer Learning¶
Now I am going to create a model using transfer learning.
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base_model.trainable = False # Freeze weights
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    # layers.Dense(64, activation='relu'),
    # layers.Dropout(0.5),
    # layers.Dense(32, activation='relu'),
    # layers.Dropout(0.5),
    layers.Dense(num_classes, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Fit the model with early stopping and learning-rate reduction
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=5)
]
history = model.fit(X_train, y_train_encoded, epochs=50, batch_size=32, validation_data=(X_val, y_val_encoded), callbacks=callbacks)
# Plot accuracy and loss graphs
plot_training_history(history)
Epoch 1/50 48/48 [==============================] - 3s 43ms/step - loss: 0.3735 - accuracy: 0.1078 - val_loss: 0.1974 - val_accuracy: 0.3172 - lr: 0.0010 Epoch 2/50 48/48 [==============================] - 2s 35ms/step - loss: 0.2284 - accuracy: 0.2630 - val_loss: 0.1604 - val_accuracy: 0.4552 - lr: 0.0010 Epoch 3/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1890 - accuracy: 0.3708 - val_loss: 0.1444 - val_accuracy: 0.5448 - lr: 0.0010 Epoch 4/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1648 - accuracy: 0.4247 - val_loss: 0.1416 - val_accuracy: 0.4897 - lr: 0.0010 Epoch 5/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1468 - accuracy: 0.4832 - val_loss: 0.1265 - val_accuracy: 0.5379 - lr: 0.0010 Epoch 6/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1326 - accuracy: 0.5122 - val_loss: 0.1174 - val_accuracy: 0.5793 - lr: 0.0010 Epoch 7/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1245 - accuracy: 0.5404 - val_loss: 0.1195 - val_accuracy: 0.5793 - lr: 0.0010 Epoch 8/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1145 - accuracy: 0.5648 - val_loss: 0.1166 - val_accuracy: 0.5931 - lr: 0.0010 Epoch 9/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1092 - accuracy: 0.5694 - val_loss: 0.1139 - val_accuracy: 0.6000 - lr: 0.0010 Epoch 10/50 48/48 [==============================] - 2s 35ms/step - loss: 0.1041 - accuracy: 0.5865 - val_loss: 0.1075 - val_accuracy: 0.6000 - lr: 0.0010 Epoch 11/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0998 - accuracy: 0.5963 - val_loss: 0.1044 - val_accuracy: 0.6069 - lr: 0.0010 Epoch 12/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0903 - accuracy: 0.6180 - val_loss: 0.1066 - val_accuracy: 0.5931 - lr: 0.0010 Epoch 13/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0897 - accuracy: 0.6193 - val_loss: 0.1010 - val_accuracy: 0.6345 - lr: 
0.0010 Epoch 14/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0855 - accuracy: 0.6167 - val_loss: 0.1022 - val_accuracy: 0.6414 - lr: 0.0010 Epoch 15/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0820 - accuracy: 0.6298 - val_loss: 0.1069 - val_accuracy: 0.6207 - lr: 0.0010 Epoch 16/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0795 - accuracy: 0.6377 - val_loss: 0.1043 - val_accuracy: 0.6483 - lr: 0.0010 Epoch 17/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0772 - accuracy: 0.6292 - val_loss: 0.1033 - val_accuracy: 0.6414 - lr: 0.0010 Epoch 18/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0727 - accuracy: 0.6417 - val_loss: 0.1047 - val_accuracy: 0.5931 - lr: 0.0010 Epoch 19/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0708 - accuracy: 0.6568 - val_loss: 0.1028 - val_accuracy: 0.6276 - lr: 2.0000e-04 Epoch 20/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0697 - accuracy: 0.6443 - val_loss: 0.1013 - val_accuracy: 0.6414 - lr: 2.0000e-04 Epoch 21/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0679 - accuracy: 0.6456 - val_loss: 0.1010 - val_accuracy: 0.6138 - lr: 2.0000e-04 Epoch 22/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0657 - accuracy: 0.6509 - val_loss: 0.1030 - val_accuracy: 0.6414 - lr: 2.0000e-04 Epoch 23/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0675 - accuracy: 0.6509 - val_loss: 0.1031 - val_accuracy: 0.6345 - lr: 2.0000e-04 Epoch 24/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0678 - accuracy: 0.6581 - val_loss: 0.1030 - val_accuracy: 0.6345 - lr: 4.0000e-05 Epoch 25/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0688 - accuracy: 0.6548 - val_loss: 0.1020 - val_accuracy: 0.6483 - lr: 4.0000e-05 Epoch 26/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0666 - accuracy: 0.6667 - 
val_loss: 0.1026 - val_accuracy: 0.6483 - lr: 4.0000e-05 Epoch 27/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0673 - accuracy: 0.6568 - val_loss: 0.1025 - val_accuracy: 0.6483 - lr: 4.0000e-05 Epoch 28/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0662 - accuracy: 0.6443 - val_loss: 0.1020 - val_accuracy: 0.6483 - lr: 4.0000e-05 Epoch 29/50 48/48 [==============================] - 2s 34ms/step - loss: 0.0647 - accuracy: 0.6568 - val_loss: 0.1020 - val_accuracy: 0.6483 - lr: 8.0000e-06 Epoch 30/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0639 - accuracy: 0.6489 - val_loss: 0.1022 - val_accuracy: 0.6483 - lr: 8.0000e-06 Epoch 31/50 48/48 [==============================] - 2s 35ms/step - loss: 0.0669 - accuracy: 0.6443 - val_loss: 0.1022 - val_accuracy: 0.6483 - lr: 8.0000e-06
This plot looks a lot better than the last one, with the training and validation curves tracking each other more closely. However, I haven't gotten around to evaluating the model properly yet.
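For that evaluation, plain accuracy is not very informative with multi-label outputs; precision and recall over the predicted label sets would say more. A minimal, NumPy-only sketch of a micro-averaged score at a 0.5 threshold (the function name and threshold choice are my own):

```python
import numpy as np

def multilabel_prf(y_true, y_prob, threshold=0.5):
    """Micro-averaged precision, recall, and F1 for multi-hot targets.

    y_true: (n_samples, n_classes) multi-hot ground truth
    y_prob: (n_samples, n_classes) sigmoid outputs of the model
    """
    y_pred = (y_prob >= threshold).astype(np.float32)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted and present
    fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted but absent
    fn = np.sum((y_pred == 0) & (y_true == 1))  # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With the variables from this notebook, it could then be called as `multilabel_prf(y_test_encoded, model.predict(X_test))`.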